Segmentation-Based And Segmentation-Free Methods for Spotting Handwritten Arabic Words
نویسندگان
چکیده
Given a set of handwritten documents, a common goal is to search for a relevant subset. Attempting to find a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candidate word regions on a page. Attempting to definitely segment the document into such regions (automatic segmentation) can meet with some success, but the performance of such an algorithm is often a limiting factor in spotting performance. Another approach is to directly scan the image on the page without attempting to generate such a definite segmentation. A new algorithm for word spotting and a comparison of recent algorithms which act on previously unsegmented Arabic handwritten text is presented. The algorithms considered are an automated word segmentation method presented previously and a “segmentation free” algorithm which performs spotting directly on lines of unsegmented text. The segmentation free approach performs spotting and segmentation concurrently using a sliding window. The spotting method used to judge the performance of the algorithms is a character based method, but the results are independent of the actual spotting method used. The segmentation-free method performs an average of 5-10% better than the automated segmentation method, and manages to have a lower per query cost on unprocessed images. However, it has a larger per query cost on preprocessed documents.
منابع مشابه
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
متن کاملRadial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...
متن کاملSegmentation-free Word Spotting for Handwritten Arabic Documents
6 Abstract — In this paper we present an unsupervised segmentation-free method for spotting and searching query, especially, for images documents in handwritten Arabic, for this, Histograms of Oriented Gradients (HOGs) are used as the feature vectors to represent the query and documents image. Then, we compress the descriptors with the product quantization method. Finally, a better representati...
متن کاملWord Spotting in Handwritten Arabic Documents Using Bag-Of-Descriptors
This paper presents a query-by-example word spotting in handwritten Arabic documents, based on Scale Invariant Feature Transform (SIFT), without using any text word or line segmentation approach, because any errors affect to the subsequent word representation. First the interest points are automatically extracted from the images using SIFT detector, then, we use SIFT descriptor to represent eac...
متن کاملComponent-based Segmentation of Words from Handwritten Arabic Text
Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words se...
متن کامل